GUI Agents (Computer Use)
Operating a PC with an LLM!? A quick hands-on with Claude's new "computer use" feature
OmniParser
OmniParser for pure vision-based GUI agent
Large Language Model-Brained GUI Agents: A Survey
Agent S: An Open Agentic Framework that Uses Computers Like a Human
OmniParser for Pure Vision Based GUI Agent
OS-Atlas: A Foundation Action Model For Generalist GUI Agents
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
A rough introduction to web agents that operate browsers with LLMs, and the surrounding technologies
BrowserGym
BrowserGym Leaderboard